September 6 2019

Agenda for day two:

  • Pros and cons of base R graphics, lattice, and ggplot2.
  • Bar plots.
  • Box plots.
  • Histograms/density plots.
  • Themes.
  • Combine multiple plots onto one page (ggarrange).
  • Plotting data using Google maps (ggmap).

Pros and Cons of Base R, Lattice and ggplot

Base R, lattice, and ggplot2 are three different approaches to data visualization in R:

  • Base R
    • Great if all you need is a quick and simple plot.
    • However, anything more complex becomes tedious to build.
  • Lattice
    • Its syntax is very similar to writing models in R.
    • It is also straightforward to create subplots (facets, as we are used to).
    • However, detailed customization is either very tedious, or impossible.
  • ggplot2
    • Detailed customization is straightforward due to "verb" functions, as per the tidyverse standard.
    • Includes prebuilt statistical operations to give more meaning to your plots.
    • However, has a steeper learning curve that can be overwhelming at first.
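
To make the comparison concrete, here is a minimal sketch that draws the same scatter plot with each system. It uses the built-in mtcars data set rather than the workshop data, and the variable choices (wt, mpg, cyl) are purely illustrative.

```r
library(lattice)
library(ggplot2)

# Base R: a single call gives a quick plot
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")

# lattice: formula interface; "| factor(cyl)" creates one panel per group
p_lattice <- xyplot(mpg ~ wt | factor(cyl), data = mtcars)

# ggplot2: layers are added with "+"; facet_wrap() creates the panels
p_ggplot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl)

print(p_lattice)
print(p_ggplot)
```

Note how the lattice call reads like a model formula, while the ggplot2 version is built up one layer at a time.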

Example 1

Example 2

Example 3

Question 1

What aesthetic would you set if you wanted to change the color of a bar plot?

  • A.) color
  • B.) shape
  • C.) size
  • D.) fill

You may use Google to find the answer

Answer 1

The correct choice is D.)

color only changes the border of the bars.
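
The difference is easy to check on a small example. The sketch below uses a made-up three-row data frame (toy is not part of the workshop data) and sets each aesthetic to a fixed value.

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy <- data.frame(group = c("a", "b", "c"), value = c(3, 5, 2))

# fill colors the inside of each bar
p_fill <- ggplot(toy, aes(x = group, y = value)) +
  geom_bar(stat = "identity", fill = "steelblue3")

# color only draws the outline; the inside keeps the default grey
p_color <- ggplot(toy, aes(x = group, y = value)) +
  geom_bar(stat = "identity", color = "steelblue3")

print(p_fill)
print(p_color)
```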

Recreating Example 1

Calculate means manually before plotting.

# Use dplyr to obtain means by admin type and admin source
DIABETES_MED_MEANS <- DIABETES_CLASS %>% 
  filter(!(admin_source=="Missing" | admin_source=="Other")) %>% 
  group_by(admin_type,admin_source) %>% 
  summarise(mean_med = mean(num_medications))

Recreating Example 1

Start with the basic plot.

  • geom_bar() plots a bar plot.
  • stat="identity" uses the value of y, rather than the count.
  • position = position_dodge() causes the bars to be placed next to each other.

https://ggplot2.tidyverse.org/reference/geom_bar.html

# Use dplyr to obtain means by admin type and admin source
DIABETES_MED_MEANS <- DIABETES_CLASS %>% 
  filter(!(admin_source=="Missing" | admin_source=="Other")) %>% 
  group_by(admin_type,admin_source) %>% 
  summarise(mean_med = mean(num_medications))

# Plotting
barplot_exp <- ggplot(DIABETES_MED_MEANS,aes(x=admin_type,y=mean_med,fill=admin_source)) + 
              ggtitle("Type of Admission Versus Mean Number of Medications Taken
Grouped by Admission Source") +
              xlab("Admission Type") + ylab("Mean Number of Medications") +
              geom_bar(stat="identity", position = position_dodge())
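
As a side note, ggplot2 also provides geom_col(), which is shorthand for geom_bar(stat="identity"). The sketch below checks this on a small made-up table of means; toy_means and its column names are hypothetical stand-ins for DIABETES_MED_MEANS.

```r
library(ggplot2)

# Hypothetical pre-computed means, standing in for DIABETES_MED_MEANS
toy_means <- data.frame(
  type     = c("A", "A", "B", "B"),
  source   = c("x", "y", "x", "y"),
  mean_val = c(10.2, 12.5, 9.8, 14.1)
)

# geom_bar() with stat = "identity" plots y as given ...
p_bar <- ggplot(toy_means, aes(x = type, y = mean_val, fill = source)) +
  geom_bar(stat = "identity", position = position_dodge())

# ... and geom_col() draws the same bars with less typing
p_col <- ggplot(toy_means, aes(x = type, y = mean_val, fill = source)) +
  geom_col(position = position_dodge())

# Both layers end up with identical bar heights
all.equal(layer_data(p_bar)$y, layer_data(p_col)$y)
```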

Recreating Example 1

Use scale_fill_manual() to manually change the fill properties

  • This is the same process that we are used to.

# Use dplyr to obtain means by admin type and admin source
DIABETES_MED_MEANS <- DIABETES_CLASS %>% 
  filter(!(admin_source=="Missing" | admin_source=="Other")) %>% 
  group_by(admin_type,admin_source) %>% 
  summarise(mean_med = mean(num_medications))


# Plotting
barplot_exp <- ggplot(DIABETES_MED_MEANS,aes(x=admin_type,y=mean_med,fill=admin_source)) + 
              ggtitle("Type of Admission Versus Mean Number of Medications Taken
Grouped by Admission Source") +
              xlab("Admission Type") + ylab("Mean Number of Medications") +
              geom_bar(stat="identity", position = position_dodge()) +
              scale_fill_manual(values=c("firebrick3","seagreen4","purple3","steelblue3"), 
                                name="Admission Source")

Recreating Example 1

Add the means for each bar via geom_text().

  • Must set a separate aes()
    • label is the data that is going to be turned to text
  • vjust offsets the vertical position of all text
  • position = position_dodge() places the labels side by side so that each one sits on its own bar.

https://ggplot2.tidyverse.org/reference/geom_text.html

# Use dplyr to obtain means by admin type and admin source
DIABETES_MED_MEANS <- DIABETES_CLASS %>% 
  filter(!(admin_source=="Missing" | admin_source=="Other")) %>% 
  group_by(admin_type,admin_source) %>% 
  summarise(mean_med = mean(num_medications))

# Plotting
barplot_exp <- ggplot(DIABETES_MED_MEANS,aes(x=admin_type,y=mean_med,fill=admin_source)) + 
              ggtitle("Type of Admission Versus Mean Number of Medications Taken
Grouped by Admission Source") +
              xlab("Admission Type") + ylab("Mean Number of Medications") +
              geom_bar(stat="identity", position = position_dodge()) +
              scale_fill_manual(values=c("firebrick3","seagreen4","purple3","steelblue3"),
                                name="Admission Source") +
              geom_text(aes(label=round(mean_med,1)),
                        vjust=1, position = position_dodge(.9),size=3, color="white")

Exercise 1

  • 1.) Use group_by() and summarise() to calculate the mean birth weight by race and smoking status.
  • 2.) Use ggplot() and geom_bar() to create a basic bar plot of race vs mean birth weight at first birth.
  • 3.) Add a title and axis labels.
  • 4.) Change the fill of the bars by smoking status and set manual fills.
  • 5.) (Bonus) Use geom_text() to add the means to each bar.
  • 6.) (Bonus) Use scale_x_discrete() to clean the x-axis tick labels.

Recreating Example 2

First we will filter out missing and other races.

After that, the usual parameters are set:

  • geom_boxplot() plots the box plot.
  • outlier.shape=NA hides the outlier points; they stay in the data and are only left undrawn.

https://ggplot2.tidyverse.org/reference/geom_boxplot.html

#Filter
DIABETES_FILTER <- DIABETES_CLASS %>% filter( !(race=="Other" | race=="Missing") )

# Plotting
boxplot_exp <- ggplot(DIABETES_FILTER,aes(x=race,y=time_in_hospital,fill=sex)) +
              ggtitle("Box Plots of Race Versus Time Spent in Hospital Grouped by Sex") +
              xlab("Race") + ylab("Time Spent in Hospital (Hours)") +
              geom_boxplot(outlier.shape=NA) +
              scale_fill_manual(values = c("orchid3","deepskyblue3"),
                                name="Sex")
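
One consequence of outlier.shape=NA is worth seeing on a small example: the points are hidden, but the y-axis range is still computed with them. The sketch below uses a made-up data frame (toy is hypothetical) and shows coord_cartesian() as one possible way to zoom in afterwards.

```r
library(ggplot2)

# Hypothetical toy data: group "b" contains one extreme value (50)
toy <- data.frame(
  group = rep(c("a", "b"), each = 11),
  value = c(1:11, 1:10, 50)
)

# outlier.shape = NA hides the point, but the y-axis still stretches to 50
p_hidden <- ggplot(toy, aes(x = group, y = value)) +
  geom_boxplot(outlier.shape = NA)

# coord_cartesian() zooms in on the boxes without dropping any data
p_zoomed <- p_hidden + coord_cartesian(ylim = c(0, 15))

print(p_hidden)
print(p_zoomed)
```

Zooming with coord_cartesian() leaves the box statistics untouched, whereas filtering the data before plotting would change the quartiles.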

Exercise 2

  • 1.) Use ggplot() and geom_boxplot() to create a basic box plot of race vs maternal weight at first birth.
  • 2.) Add a title and axis labels.
  • 3.) Change the fill of the boxes by history of hypertension and set manual fills.
  • 4.) Use scale_x_discrete() to clean the x-axis tick labels.
  • 5.) (Bonus) Show the outliers and change their aesthetics.
  • (Hint) outlier.shape is one of the parameters you can set within geom_boxplot(); try googling the other options.

Recreating Example 3

Plot the histogram via geom_histogram()

  • Here we set both color and fill to map to sex
  • mapping = aes(y=stat(density)) turns y into a density rather than a count.
  • binwidth controls the width of each bar.
  • position="identity" overlays the histograms on top of one another instead of stacking them.
  • alpha controls the transparency of the histograms.

https://ggplot2.tidyverse.org/reference/geom_histogram.html

density_exp <- ggplot(DIABETES_CLASS,aes(x=weight,fill=sex,color=sex)) +
  ggtitle("Distribution of Weight by Sex") +
  xlab("Weight (lbs.)") + ylab("Density") + 
  geom_histogram(mapping = aes(y=stat(density)),
                 binwidth=5,
                 position="identity",
                 alpha=.1) 
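
A note on notation: in ggplot2 3.3.0 and later, after_stat(density) is the replacement for stat(density), which newer releases flag as deprecated. The sketch below shows the same histogram layer written that way. It runs on simulated weights (the toy data frame is made up, not the workshop data) and checks that the bars really are on a density scale.

```r
library(ggplot2)

# Simulated weights, only for illustration
set.seed(1)
toy <- data.frame(
  weight = c(rnorm(200, mean = 150, sd = 20), rnorm(200, mean = 180, sd = 25)),
  sex    = rep(c("Female", "Male"), each = 200)
)

p_density_hist <- ggplot(toy, aes(x = weight, fill = sex, color = sex)) +
  geom_histogram(mapping = aes(y = after_stat(density)),
                 binwidth = 5,
                 position = "identity",
                 alpha = .1)

# On the density scale, the bar areas within each group sum to 1
built <- layer_data(p_density_hist)
areas <- tapply(built$density * (built$xmax - built$xmin), built$group, sum)
print(areas)
```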

Recreating Example 3

Set the aesthetics manually.

  • We will need to do this twice:
    • Once for the fill via scale_fill_manual()
    • Once for the color via scale_color_manual()
  • However, these functions are used exactly the same way as before.

density_exp <- ggplot(DIABETES_CLASS,aes(x=weight,fill=sex,color=sex)) +
              ggtitle("Distribution of Weight by Sex") +
              xlab("Weight (lbs.)") + ylab("Density") + 
              geom_histogram(mapping = aes(y=stat(density)),
                             binwidth=5,
                             position="identity",
                             alpha=.1) +
              scale_fill_manual(values = c("deeppink1","deepskyblue1")) +
              scale_color_manual(values = c("deeppink4","deepskyblue4"))

Recreating Example 3

Plot the density over each histogram via geom_density()

https://ggplot2.tidyverse.org/reference/geom_density.html

  • The curves line up with the histograms because we mapped y to density in geom_histogram(); geom_density() inherits x, fill, and color from ggplot().

density_exp <- ggplot(DIABETES_CLASS,aes(x=weight,fill=sex,color=sex)) +
              ggtitle("Distribution of Weight by Sex") +
              xlab("Weight (lbs.)") + ylab("Density") + 
              geom_histogram(mapping = aes(y=stat(density)),
                             binwidth=5,
                             position="identity",
                             alpha=.1) +
              scale_fill_manual(values = c("deeppink1","deepskyblue1")) +
              scale_color_manual(values = c("deeppink4","deepskyblue4")) +
              geom_density(alpha=.1)

Exercise 3

  • 1.) Use ggplot() and geom_density() to create a basic density plot of maternal age at second birth.
  • 2.) Add a title and axis labels.
  • 3.) Change the fill of the curves by History of Pre-term Labor at Second Birth and set manual fills.
  • 4.) Change the color of the curves by History of Pre-term Labor at Second Birth and set manual colors.

Themes

Recall from yesterday some of the pre-defined themes in ggplot2.

  • theme_light(), theme_bw(), etc.

Today we will take a look at the theme() function and how it can be used to modify individual parts of a theme.

Themes

Like all other functions, theme() has many different parameters we can set.

For example:

  • legend.background changes the appearance of the background for all legends.
  • legend.spacing changes the spacing between legends
  • legend.margin changes the dimensions of the legend margins
  • axis.title changes the appearance of the axis titles
  • axis.ticks changes the appearance of the tick marks on both axes
  • axis.ticks.length changes the length of the tick marks on both axes
  • plot.title changes the appearance of the plot title
  • plot.background changes the appearance of the entire plot background
  • strip.text changes the appearance of facet labels

For a complete list:

https://ggplot2.tidyverse.org/reference/theme.html

Themes

The tricky part is that some of these parameters require you to define an element_ object (or a margin or unit).

  • element_rect() defines a rectangle
    • you can set the fill, color, size, and linetype.
  • element_line() defines a line
    • you can set the color, size, linetype, and lineend
  • element_text() defines text
    • you can set the family (font type), face (plain, italic, bold, bold.italic), color, size, vjust, hjust, and angle.
  • margin() defines a margin
    • you can set the dimensions of each margin: t,r,b,l (top,right,bottom,left)
  • unit() defines a unit
    • you set x (the number) and units (cm, inches, mm, points, etc.)

Themes

Let's use the third example to test out theme().

Say I would like to make the following 5 changes to the theme of the plot:

  • Change the background of the legend so that the border is blue.
  • Change the plot title to bold text in a serif font (such as Times New Roman).
  • Change the axis ticks to be red.
  • Change the axis ticks to be .25 centimetres long.
  • Change the legend margins to be .5 cm on the top and bottom and 1 cm on the sides.

Then the call to theme would look like this:

theme(legend.background = element_rect(color="blue"),
      plot.title = element_text(family = "serif", face = "bold"),
      axis.ticks = element_line(color="red"),
      axis.ticks.length = unit(.25, "cm"),
      legend.margin = margin(t=.5,b=.5,r=1,l=1,unit="cm"))

Themes

  ggplot(DIABETES_CLASS,aes(x=weight,fill=sex,color=sex)) +
  ggtitle("Distribution of Weight by Sex") +
  xlab("Weight (lbs.)") + ylab("Density") + 
  geom_histogram(mapping = aes(y=stat(density)),
                 binwidth=5,
                 position="identity",
                 alpha=.1) +
  scale_fill_manual(values = c("deeppink1","deepskyblue1")) +
  scale_color_manual(values = c("deeppink4","deepskyblue4")) +
  geom_density(alpha=.1) +
  #Theme modifications  
  theme(legend.background = element_rect(color="blue"),
        plot.title = element_text(family = "serif", face = "bold"),
        axis.ticks = element_line(color="red"),
        axis.ticks.length = unit(x=.25, units="cm"),
        legend.margin = margin(t=.5,b=.5,r=1,l=1,unit="cm"))

Themes

Question 2

What parameter would you set in theme() if you wanted to place the legend at the top of the plot? How would you set this parameter?

Hint: this link lists all the settable parameters within the theme() function as well as the requirements for setting them.

https://ggplot2.tidyverse.org/reference/theme.html

Answer 2

The correct call to theme() would look like this:

theme(legend.position = "top")

The full example that follows applies this setting and also marks the mean weight of each sex with a dashed vertical line via geom_vline().

#Get mean weight for each sex.
DIABETES_WEIGHT_MEANS <- DIABETES_CLASS %>% group_by(sex) %>% 
  summarise(mean_weight = mean(weight,na.rm=TRUE))

# Plotting
ggplot(DIABETES_CLASS,aes(x=weight,fill=sex,color=sex)) +
  ggtitle("Distribution of Weight by Sex") +
  xlab("Weight (lbs.)") + ylab("Density") + 
  geom_histogram(mapping = aes(y=stat(density)),
                 binwidth=5,
                 position="identity",
                 alpha=.1) +
  scale_fill_manual(values = c("deeppink1","deepskyblue1")) +
  scale_color_manual(values = c("deeppink4","deepskyblue4")) +
  geom_density(alpha=.1) +
  geom_vline(data=DIABETES_WEIGHT_MEANS, 
             aes(xintercept=mean_weight,color=sex),
             linetype="dashed") +
  # Theme modifications: legend on top, title centered via hjust
  # (replaces padding the title string with spaces)
  theme(legend.position = "top",
        plot.title = element_text(hjust = 0.5))

ggarrange

ggarrange(), from the ggpubr package, is a function that allows us to arrange multiple ggplots on the same page.

You would usually do this when you have multiple plots that show different parts of a relationship.

Today we will use the plots that we made.

ggarrange

For example, if we wanted to stack the bar plot above the box plot from the examples, then the call to ggarrange() would look like this:

ggarrange(barplot_exp,boxplot_exp, ncol = 1, nrow = 2)
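
Swapping the ncol and nrow values changes the layout. The sketch below is self-contained: p1 and p2 are throwaway plots built from the built-in mtcars data, standing in for barplot_exp and boxplot_exp, and it assumes the ggpubr package is installed.

```r
library(ggplot2)
library(ggpubr)

# Two toy plots standing in for barplot_exp and boxplot_exp
p1 <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
p2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# One row, two columns: the plots sit side by side
side_by_side <- ggarrange(p1, p2, ncol = 2, nrow = 1)
print(side_by_side)
```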

Exercise 4

Arrange two of the three plots you created today.

  • 1.) Decide which two plots you would like to arrange.
  • 2.) Decide if you would like them side by side or stacked.
  • 3.) (Bonus) add labels to each plot.

ggmap

ggmap is a package that allows you to obtain map data from Google maps and plot it using the ggplot framework.

Unfortunately, due to recent changes to the Google API, using ggmap means you must have an API key.

Moreover, users are required to enter valid credit card information to register for an API key.

This means that we won't be able to do interactive examples in ggmap.

Nonetheless, we can still look at some examples.

ggmap

This example shows densities of lightning strikes in Houston.

ggmap

This plot relies on latitude and longitude data collected from the World Wide Lightning Location Network.

head(lightning_raw)
     lat     lon
1 29.775 -94.649
2 30.240 -94.270
3 29.803 -94.418
4 29.886 -94.342
5 29.892 -94.085
6 29.898 -94.071

ggmap

Obtaining the map of Houston requires knowing its latitude and longitude coordinates.

library(ggmap)
# get_map() queries Google, so an API key must first be registered via register_google()
houston <- c(lon = -95.36, lat = 29.76)
houston_map <- get_map(location = houston, zoom = 8, color = "bw")

ggmap

Now we can use ggmap() in place of ggplot().

  • maprange=FALSE indicates that the map shall not define the x and y limits for the entire plot.

Otherwise we build the ggplot as we are used to.

ggmap(houston_map, maprange=FALSE) +
  stat_density_2d(data = lightning_raw, 
                  aes(x = lon, y = lat, fill = ..level.., alpha = ..level..),
                  color="blue",size = 0.01, bins = 16, geom = 'polygon') +
  scale_fill_gradient(low = "green", high = "red") +
  scale_alpha(range = c(0.00, 0.25), guide = "none") +
  theme(legend.position = "none", 
        axis.title = element_blank(), 
        text = element_text(size = 12))
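
Since the map itself needs an API key, the density layer can still be tried on its own. The sketch below is only an approximation of the example above: it uses plain ggplot() instead of ggmap(), simulated coordinates (fake_strikes is made up and is NOT real lightning data), and the after_stat(level) notation that ggplot2 3.3.0 and later use in place of ..level..

```r
library(ggplot2)

# Simulated coordinates (NOT real lightning data), scattered around
# the Houston coordinates used above
set.seed(42)
fake_strikes <- data.frame(
  lon = rnorm(500, mean = -95.36, sd = 0.4),
  lat = rnorm(500, mean = 29.76, sd = 0.3)
)

# Same density layer and scales as the ggmap example, without the base map
p_density <- ggplot(fake_strikes, aes(x = lon, y = lat)) +
  stat_density_2d(aes(fill = after_stat(level), alpha = after_stat(level)),
                  bins = 16, geom = "polygon") +
  scale_fill_gradient(low = "green", high = "red") +
  scale_alpha(range = c(0.00, 0.25), guide = "none") +
  coord_quickmap() +
  theme(legend.position = "none",
        axis.title = element_blank())

print(p_density)
```

With a registered key, replacing ggplot(fake_strikes, ...) with ggmap(houston_map, maprange=FALSE) and passing the data to stat_density_2d() gives the layered map shown in the slides.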

Take Home Exercise

R for Data Science is an excellent resource for learning more about ggplot.

We recommend you try the following exercises as homework:

  • Exercise 3.6.1 (Questions 2,4-6)